Computer Science - Computation and Language
Papers
LLM4Decompile: Decompiling Binary Code with Large Language Models
Decompilation aims to convert binary code to high-level source code, but traditional tools like Ghidra often produce results that are difficult to read and execute. Motivated by the advancements in La...
Large Language Models and Emergence: A Complex Systems Perspective
Emergence is a concept in complexity science that describes how many-body systems manifest novel higher-level properties, properties that can be described by replacing high-dimensional mechanisms with...
Simple linear attention language models balance the recall-throughput tradeoff
Recent work has shown that attention-based language models excel at recall, the ability to ground generations in tokens previously seen in context. However, the efficiency of attention-based models is...
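For orientation, here is a minimal sketch of the generic linear-attention recurrence this line of work builds on: the attention state is a fixed-size d×d matrix updated per token, so decoding throughput no longer depends on context length. This is textbook linear attention, not the paper's specific architecture or feature map, and the exponential feature map below is an illustrative assumption.

```python
import numpy as np

# Causal linear attention as a recurrent state update: O(d^2) state per head
# instead of an O(seq_len) KV cache. Generic formulation, not the paper's model.
def linear_attention_decode(qs, ks, vs):
    """qs, ks, vs: (seq_len, d) arrays of (featurized) queries/keys and values."""
    d = qs.shape[1]
    S = np.zeros((d, d))   # running sum of outer(k, v)
    z = np.zeros(d)        # running sum of k, for normalization
    outs = []
    for q, k, v in zip(qs, ks, vs):
        S += np.outer(k, v)
        z += k
        outs.append((q @ S) / (q @ z + 1e-6))
    return np.stack(outs)

rng = np.random.default_rng(0)
q, k, v = (np.exp(rng.normal(size=(8, 4))) for _ in range(3))  # positive features
print(linear_attention_decode(q, k, v).shape)  # (8, 4)
```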
Magistral
We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior mod...
Reinforcement Pre-Training
In this work, we introduce Reinforcement Pre-Training (RPT) as a new scaling paradigm for large language models and reinforcement learning (RL). Specifically, we reframe next-token prediction as a rea...
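One hedged reading of "reframing next-token prediction as a reasoning task": the corpus token itself becomes a verifiable reward signal, so every token of pretraining data can drive RL. The stub policy and reward below are illustrative assumptions, not RPT's actual rollout format or reward shaping.

```python
import random

# Hypothetical sketch: next-token prediction recast as RL with a verifiable
# reward (1 if the sampled token matches the corpus token, else 0).
corpus = ["the", "cat", "sat", "on", "the", "mat"]

def policy_sample(context):
    # Stand-in for an LLM; a real policy would condition on `context`.
    return random.choice(["the", "cat", "sat", "on", "mat"])

rewards = []
for t in range(1, len(corpus)):
    context, gold = corpus[:t], corpus[t]
    pred = policy_sample(context)
    rewards.append(1.0 if pred == gold else 0.0)  # verifiable reward signal

print(f"mean reward: {sum(rewards) / len(rewards):.2f}")
```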
Qwen3 Technical Report
In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capa...
How Much Knowledge Can You Pack Into the Parameters of a Language Model?
It has recently been observed that neural language models trained on unstructured text can implicitly store and retrieve knowledge using natural language queries. In this short paper, we measure the p...
Language Models use Lookbacks to Track Beliefs
How do language models (LMs) represent characters' beliefs, especially when those beliefs may differ from reality? This question lies at the heart of understanding the Theory of Mind (ToM) capabilitie...
The Diffusion Duality
Uniform-state discrete diffusion models hold the promise of fast text generation due to their inherent ability to self-correct. However, they are typically outperformed by autoregressive models and ma...
Chain-of-Thought Reasoning is a Policy Improvement Operator
Large language models have astounded the world with fascinating new capabilities. However, they currently lack the ability to teach themselves new skills, relying instead on large amounts of human-gen...
Reasoning with Language Model is Planning with World Model
Large language models (LLMs) have shown remarkable reasoning capabilities, especially when prompted to generate intermediate reasoning steps (e.g., Chain-of-Thought, CoT). However, LLMs can still stru...
How much do language models memorize?
We propose a new method for estimating how much a model "knows" about a datapoint and use it to measure the capacity of modern language models. Prior studies of language model memorization have stru...
BPE Stays on SCRIPT: Structured Encoding for Robust Multilingual Pretokenization
Byte Pair Encoding (BPE) tokenizers, widely used in Large Language Models, face challenges in multilingual settings, including penalization of non-Western scripts and the creation of tokens with parti...
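For context on what the paper modifies, here is a minimal vanilla byte-level BPE training loop: repeatedly merge the most frequent adjacent pair into a fresh token. The SCRIPT pretokenization scheme itself is not reproduced here.

```python
from collections import Counter

# Vanilla byte-level BPE: greedily merge the most frequent adjacent pair.
def train_bpe(text: str, num_merges: int):
    seq = list(text.encode("utf-8"))           # start from raw bytes
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair = max(pairs, key=pairs.get)       # most frequent adjacent pair
        new_id = 256 + len(merges)             # fresh token id above byte range
        merges.append((pair, new_id))
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(new_id)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, merges

tokens, merges = train_bpe("low lower lowest", num_merges=5)
print(len(tokens), merges[:2])
```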
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Recent advances in reasoning-centric language models have highlighted reinforcement learning (RL) as a promising method for aligning models with verifiable rewards. However, it remains contentious whe...
Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs
Even after fine-tuning and reinforcement learning, large language models (LLMs) can be difficult, if not impossible, to control reliably with prompts alone. We propose a new inference-time approach to...
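As background, a textbook sequential Monte Carlo step over partial token sequences looks like this: extend each particle, reweight by a potential that scores constraint satisfaction, then resample. This is generic SMC, not the paper's probabilistic-programming API; the toy vocabulary and constraint are assumptions.

```python
import math, random

# One generic SMC step: extend, reweight, resample.
def smc_step(particles, extend, log_potential, rng):
    particles = [p + [extend(p, rng)] for p in particles]
    logw = [log_potential(p) for p in particles]
    m = max(logw)
    w = [math.exp(x - m) for x in logw]        # stabilized weights
    total = sum(w)
    probs = [x / total for x in w]
    return rng.choices(particles, weights=probs, k=len(particles))

rng = random.Random(0)
vocab = list("abc")
extend = lambda p, rng: rng.choice(vocab)
log_potential = lambda p: 0.0 if p.count("a") <= 1 else -10.0  # soft constraint

particles = [[] for _ in range(8)]
for _ in range(5):
    particles = smc_step(particles, extend, log_potential, rng)
print(["".join(p) for p in particles[:3]])
```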
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
Despite recent progress in large-scale reinforcement learning (RL) for reasoning, the training recipe for building high-performing reasoning models remains elusive. Key implementation details of front...
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated notable success in enhancing the reasoning performance of large language models (LLMs), particularly on mathematics and ...
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful approach to enhancing the reasoning capabilities of Large Language Models (LLMs), while its mechanisms are not yet well ...
REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
We introduce Reasoning Gym (RG), a library of reasoning environments for reinforcement learning with verifiable rewards. It provides over 100 data generators and verifiers spanning multiple domains in...
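To illustrate the generator/verifier pattern the abstract describes, here is a toy pair: the generator emits a task with a programmatically known answer, and the verifier scores a model's output without any learned reward model. Names and structure are hypothetical, not RG's actual API.

```python
import random

# Toy generator/verifier pair in the verifiable-rewards style.
def gen_arithmetic(rng):
    a, b = rng.randint(2, 99), rng.randint(2, 99)
    return {"prompt": f"What is {a} * {b}?", "answer": str(a * b)}

def verify(task, model_output: str) -> float:
    return 1.0 if model_output.strip() == task["answer"] else 0.0

rng = random.Random(0)
task = gen_arithmetic(rng)
print(task["prompt"], "->", verify(task, "702"))  # reward is 0.0 or 1.0
```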
Learning to Model the World with Language
To interact with humans and act in the world, agents need to understand the range of language that people use and relate it to the visual world. While current agents can learn to execute simple langua...
Hardware-Efficient Attention for Fast Decoding
LLM decoding is bottlenecked for large batches and long contexts by loading the key-value (KV) cache from high-bandwidth memory, which inflates per-token latency, while the sequential nature of decodi...
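Back-of-envelope arithmetic makes the bottleneck concrete: at long context, just streaming the KV cache from HBM already sets a floor on per-token latency. All dimensions below are illustrative assumptions (roughly 7B-class, fp16), not numbers from the paper.

```python
# Why decoding is memory-bound: bytes of K and V read per generated token.
layers, kv_heads, head_dim = 32, 32, 128
seq_len, dtype_bytes = 4096, 2          # fp16

kv_bytes = 2 * layers * kv_heads * head_dim * seq_len * dtype_bytes
print(f"KV cache per sequence: {kv_bytes / 2**30:.2f} GiB")  # ~2 GiB

hbm_bw = 3.35e12  # assumed ~3.35 TB/s, H100-class
print(f"latency floor from KV reads alone: {kv_bytes / hbm_bw * 1e3:.2f} ms/token")
```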
Large Language Diffusion Models
Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre...
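For intuition, here is the generic decoding style of masked diffusion language models: start fully masked, predict every masked position in parallel, and commit the most confident fraction each step. The "model" below is a random stub, and LLaDA's actual masking schedule and network are not reproduced.

```python
import numpy as np

# Confidence-based iterative unmasking, the generic masked-diffusion decoder.
MASK, VOCAB, LENGTH, STEPS = -1, 100, 16, 4
rng = np.random.default_rng(0)

def model_predict(tokens):
    """Stub: (token, confidence) per position; a real model conditions on
    the already-unmasked tokens."""
    return rng.integers(0, VOCAB, size=len(tokens)), rng.random(len(tokens))

tokens = np.full(LENGTH, MASK)
for step in range(STEPS):
    preds, conf = model_predict(tokens)
    masked = np.where(tokens == MASK)[0]
    k = int(np.ceil(len(masked) / (STEPS - step)))  # unmask a fraction per step
    keep = masked[np.argsort(-conf[masked])[:k]]    # most confident positions
    tokens[keep] = preds[keep]
print(tokens)
```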
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards. Recent RLVR wo...
Visual Planning: Let's Think Only with Images
Recent advancements in Large Language Models (LLMs) and their multimodal extensions (MLLMs) have substantially enhanced machine reasoning across diverse tasks. However, these models predominantly rely...
Round and Round We Go! What makes Rotary Positional Encodings useful?
Positional Encodings (PEs) are a critical component of Transformer-based Large Language Models (LLMs), providing the attention mechanism with important sequence-position information. One of the most p...
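As a reference point for the analysis, here is the standard RoPE computation itself: each pair of query/key dimensions is rotated by an angle that grows linearly with position, at frequencies that decay across dimension pairs. This is the well-known formulation, not the paper's findings.

```python
import numpy as np

# Standard rotary positional encoding applied to a (seq_len, d) array.
def rope(x, base=10000.0):
    """x: (seq_len, d) with d even. Returns the rotated copy."""
    seq_len, d = x.shape
    pos = np.arange(seq_len)[:, None]             # (seq_len, 1)
    freqs = base ** (-np.arange(0, d, 2) / d)     # (d/2,) decaying frequencies
    angles = pos * freqs                          # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

q = np.random.default_rng(0).normal(size=(6, 8))
print(np.allclose(rope(q)[0], q[0]))  # position 0 is unrotated: True
```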
Byte Latent Transformer: Patches Scale Better Than Tokens
We introduce the Byte Latent Transformer (BLT), a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale with significant improvements in inferen...
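One way to read the patching idea: cut a new patch wherever a small byte-level model finds the next byte hard to predict, so compute concentrates on high-entropy regions. The entropy model below is a random stub, and BLT's actual patcher and threshold are assumptions here.

```python
import numpy as np

# Entropy-based byte patching sketch: cut when next-byte entropy spikes.
def patch_by_entropy(data: bytes, entropies, threshold=2.0):
    patches, start = [], 0
    for i in range(1, len(data)):
        if entropies[i] > threshold:   # hard-to-predict byte: start new patch
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches

data = b"the quick brown fox"
rng = np.random.default_rng(0)
entropies = rng.uniform(0.0, 4.0, size=len(data))  # stand-in for a byte LM
print([p.decode() for p in patch_by_entropy(data, entropies)])
```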
Mechanism and Emergence of Stacked Attention Heads in Multi-Layer Transformers
In this paper, I introduce the retrieval problem, a simple yet common reasoning task that can be solved only by transformers with a minimum number of layers, which grows logarithmically with the input...
Scaling Laws for Precision
Low precision training and inference affect both the quality and cost of language models, but current scaling laws do not account for this. In this work, we devise "precision-aware" scaling laws for b...
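For reference, the Chinchilla-style law that precision-aware variants extend, together with one plausible functional form for the extension: shrink the parameter count to an "effective" value at low training precision P. The specific exponential form below is an assumption for illustration, not necessarily the paper's fitted law.

```latex
% Chinchilla-style loss law:
%   L(N, D) = E + A / N^alpha + B / D^beta
% Hedged precision extension via an effective parameter count:
\[
  L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
  \qquad
  N_{\mathrm{eff}}(P) = N\left(1 - e^{-P/\gamma}\right),
  \qquad
  L(N, D, P) = E + \frac{A}{N_{\mathrm{eff}}(P)^{\alpha}} + \frac{B}{D^{\beta}}
\]
```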
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
This work presents a Fully BInarized Large Language Model (FBI-LLM), demonstrating for the first time how to train a large-scale binary language model from scratch (not the partial binary or ternary L...
Learning to Reason for Long-Form Story Generation
Generating high-quality stories spanning thousands of tokens requires competency across a variety of skills, from tracking plot and character arcs to keeping a consistent and engaging style. Due to th...
Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages
Human bilinguals often use similar brain regions to process multiple languages, depending on when they learned their second language and their proficiency. In large language models (LLMs), how are mul...
Layers at Similar Depths Generate Similar Activations Across LLM Architectures
How do the latent spaces used by independently-trained LLMs relate to one another? We study the nearest neighbor relationships induced by activations at different layers of 24 open-weight LLMs, and fi...
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
As enthusiasm for scaling computation (data and parameters) in the pretraining era gradually diminished, test-time scaling (TTS)—also referred to as “test-time computing”—has emerged as a prominent re...
TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems
This paper focuses on multimodal alignment within the realm of Artificial Intelligence, particularly in text and image modalities. The semantic gap between the textual and visual modality poses a disc...
History, Development, and Principles of Large Language Models-An Introductory Survey
Language models serve as a cornerstone in natural language processing (NLP), utilizing mathematical methods to generalize language laws and knowledge for prediction and generation. Over extensive rese...
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability
Mathematical reasoning tasks pose significant challenges for large language models (LLMs) because they require precise logical deduction and sequence analysis. In this work, we introduce the concept o...
Training Large Language Models to Reason in a Continuous Latent Space
Large language models (LLMs) are restricted to reason in the "language space", where they typically express the reasoning process with a chain-of-thought (CoT) to solve a complex reasoning problem. Ho...
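The core move, as I read the truncated abstract: instead of decoding the hidden state to a discrete token and re-embedding it at every reasoning step, feed the hidden state straight back as the next input, decoding to language only at the end. The random linear "model" below is a stub; the paper's architecture and training procedure are not reproduced.

```python
import numpy as np

# Toy sketch of reasoning in a continuous latent space.
rng = np.random.default_rng(0)
d_model, vocab = 16, 50
W_step = rng.normal(scale=0.3, size=(d_model, d_model))  # stub transformer step
W_out = rng.normal(size=(vocab, d_model))                # unembedding
embed = rng.normal(size=(vocab, d_model))                # token embeddings

h = embed[7]                        # hidden state after some prompt token
for _ in range(4):                  # "thought" steps: stay in latent space
    h = np.tanh(W_step @ h)         # no sampling, no embedding lookup
answer = int(np.argmax(W_out @ h))  # decode to a token only at the end
print("answer token id:", answer)
```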
Do Llamas Work in English? On the Latent Language of Multilingual Transformers
We ask whether multilingual language models trained on unbalanced, English-dominated corpora use English as an internal pivot language -- a question of key importance for understanding how language mo...
On the Emergence of Thinking in LLMs I: Searching for the Right Intuition
Recent advancements in AI, such as OpenAI’s new o models, Google’s Gemini Thinking model, and Deepseek R1, are transforming LLMs into LRMs (Large Reasoning Models). Unlike LLMs, LRMs perform thinking ...